Towards an intonation module for a Portuguese TTS system
Abstract
This paper presents a correlation between the linguistic structure of written text and the actual intonation behavior of read speech in European Portuguese (EP). We believe that intonation behavior in EP can be strongly predicted from two main coordinates: on the one hand, the syntactic structure of the sentence and its pragmatic communicative function; on the other, the phonological and syntactic nature of the words. The purpose of our work is to identify in real speech the main intonation elements that are relevant to speech naturalness, and to analyze the factors that determine them. This work addresses declarative/imperative, interrogative, and enumerative phrases. Basic categorizations of the intonation elements, in correlation with their underlying factors, are presented. General regularities and correlations, together with the resulting rules, which may serve as a starting point for the practical implementation of an intonation module, are presented and demonstrated under Fujisaki's phonetic/physiological approach. The methodology was based on the observation and modeling of a significant prosodic corpus in which different intonation patterns occur across a diversity of text structures. Our goal is to contribute practical techniques and experience toward more accurate intonation modeling in Text-to-Speech (TTS) applications, using a rule-based approach.
Similar resources
Modeling of intonation bearing emphasis for TTS-synthesis of Greek dialogues
TTS-synthesis of neutral style Greek with good intelligibility and quality was achieved some time ago. As a further step towards expanding the application domain of the TTS-system developed in our laboratory, the incorporation of emphasis into speech used in man-machine dialogues according to their context has been studied recently. In this paper the method applied for the analysis of int...
Maximum-likelihood dynamic intonation model for concatenative text-to-speech system
In this work we present a Maximum Likelihood (ML) joint pitch curve modeling, inspired by HMM TTS synthesis concept. This model provides an optimal solution for the coarse target intonation curve (3 points per syllable) and incorporates both static and dynamic pitch values for better utterance intonation modeling. The coarse intonation curve may be optionally combined with the original pitch ex...
Homograph ambiguity resolution in front-end design for Portuguese TTS systems
In this paper, a module for homograph disambiguation in Portuguese Text-to-Speech (TTS) is proposed. This module works with a part-of-speech (POS) parser, used to disambiguate homographs that belong to different parts-of-speech, and a semantic analyzer, used to disambiguate homographs which belong to the same part-of-speech. The proposed algorithms are meant to solve a significant part of homogr...
Microsoft Mulan - a bilingual TTS system
This paper describes a bilingual text-to-speech (TTS) system, Microsoft Mulan, which switches between Mandarin and English smoothly and which maintains the sentence level intonation even for mixed-lingual texts. Mulan is constructed on the basis of the Soft Prediction Only prosodic strategy and the Prosodic-Constraint Orient unit-selection strategy. The unit-selection module of Mulan is shared a...
Comparison of chironomic stylization versus statistical modeling of prosody for expressive speech synthesis
Chironomic stylization is the process of real-time modification of intonation contours (f0 and tempo) using drawing/writing gestures with a stylus on a graphic tablet. The question addressed in this research is whether hand-made intonation stylization could improve or degrade expressivity and overall quality, compared to statistical modeling of prosody. A system for expressive TTS in French bas...